Knowledge of language origin improves pronunciation accuracy of proper names

Authors

Ariadna Font Llitjós

Alan W. Black

Abstract

As it is impossible to have a lexicon with complete coverage, and a high proportion of unknown words are proper names, this paper addresses the issue of automatically finding pronunciations of unseen proper names in US English. Proper names, especially in the US, may come from a large range of ethnic backgrounds. We present a model and results showing that including the ethnic origin of words in a statistical model can improve pronunciation results. We used a lexicon of 56,000 proper names from CMUDICT. We also gathered data (text and proper names) from 26 languages to build statistical models that provide an estimate of word origin. Tests against held-out data showed a 7.6% absolute improvement from a baseline of 54.8% when language-based features were added to our CART-based model. As there are potentially multiple correct pronunciations, we synthesized a random sample of names that did not match the “correct” answer in our test set. Human listeners showed a 17% preference for the model with language features compared to the baseline.
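The abstract does not say how the word-origin estimate is computed, so the following is only a rough sketch of one common way to do it: a smoothed letter-trigram model per language, whose scores are turned into features that a tree-based letter-to-sound model could consume. The names `TrigramModel`, `origin_features`, and `TOY_NAMES`, the three toy languages, and the add-one smoothing are all assumptions made for illustration; they are not taken from the paper, which used data from 26 languages.

```python
import math
from collections import Counter


def trigrams(word):
    # Pad the word so that word-initial and word-final letter patterns count.
    padded = f"##{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramModel:
    """Add-one smoothed letter-trigram model for one language (hypothetical)."""

    def __init__(self, words):
        self.counts = Counter(t for w in words for t in trigrams(w))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra bucket for unseen trigrams

    def log_prob(self, word):
        grams = trigrams(word)
        score = sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in grams
        )
        return score / len(grams)  # normalise by length so long names are not penalised


def origin_features(word, models):
    """Return illustrative origin features for one name."""
    scores = {lang: m.log_prob(word) for lang, m in models.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {
        "best_lang": ranked[0],
        "second_lang": ranked[1],
        "margin": scores[ranked[0]] - scores[ranked[1]],
    }


# Toy training data, invented for this example only.
TOY_NAMES = {
    "italian": ["rossi", "bianchi", "romano", "colombo", "ricci",
                "marino", "greco", "bruno", "gallo", "conti"],
    "german": ["schmidt", "schneider", "fischer", "weber", "schulz",
               "wagner", "becker", "hoffmann", "schwarz", "zimmermann"],
    "japanese": ["tanaka", "suzuki", "yamamoto", "watanabe", "takahashi",
                 "nakamura", "kobayashi", "sato", "ito", "kato"],
}
MODELS = {lang: TrigramModel(words) for lang, words in TOY_NAMES.items()}

if __name__ == "__main__":
    for name in ["schumacher", "martino"]:
        print(name, origin_features(name, MODELS))
```

In a setup like the one the abstract describes, features of this kind would be attached to each letter's context before training the decision tree, so that the tree can split on the likely origin of the name as well as on neighbouring letters. The reported figures imply a held-out accuracy of 54.8% + 7.6% = 62.4% once the language features are included.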


Similar resources

Improving Pronunciation Accuracy of Proper Names with Language Origin Classes

Pronunciation of proper names that have different and varied language sources is an extremely hard task, even for humans. This thesis presents an attempt to improve automatic pronunciation of proper names by modeling the way humans do it, and tries to eliminate synthesis errors that humans would never make. It does so by taking into account the different language and language family sources and...


Basis Identification for Automatic Creation of Pronunciation Lexicon for Proper Names

Development of a proper names pronunciation lexicon is usually a manual effort which cannot be avoided. Grapheme to phoneme (G2P) conversion modules, in the literature, are usually rule based and work best for non-proper names in a particular language. Proper names are foreign to a G2P module. We follow an optimization approach to enable automatic construction of proper names pronunciation lexicon...


G2P Conversion of Proper Names Using Word Origin Information

Motivated by the fact that the pronunciation of a name may be influenced by its language of origin, we present methods to improve pronunciation prediction of proper names using word origin information. We train grapheme-to-phoneme (G2P) models on language-specific data sets and interpolate the outputs. We perform experiments on US surnames, a data set where word origin variation occurs naturall...


Generating proper name pro for automatic speech

Generating correct pronunciation of proper names remains one of the most difficult tasks in text-to-phoneme transcription. Although phonetic rules can be efficient in processing proper names of one language, foreign family names cannot be always correctly generated without additional pronunciation rules. The present study addresses the problem of pronunciation variants for French and foreign fa...


Proper Name Machine Translation from Japanese to Japanese Sign Language

This paper describes machine translation of proper names from Japanese to Japanese Sign Language (JSL). “Proper name transliteration” is a kind of machine translation of proper names between spoken languages and involves character-to-character conversion based on pronunciation. However, transliteration methods cannot be applied to Japanese-JSL machine translation because proper names in JSL are ...



Publication date: 2001
